
Exploiting random projections and sparsity with random forests and gradient boosting methods - Application to multi-label and multi-output learning, random forest model compression and leveraging input sparsity


Abstract

Within machine learning, the supervised learning field aims at modeling the input-output relationship of a system from past observations of its behavior. Decision trees characterize the input-output relationship through a series of nested "if-then-else" questions, the testing nodes, leading to a set of predictions, the leaf nodes. Several such trees are often combined for state-of-the-art performance: random forest ensembles average the predictions of randomized decision trees trained independently in parallel, while tree boosting ensembles train decision trees sequentially, each refining the predictions made by the previous ones.

The emergence of new applications requires supervised learning algorithms that scale, in computational power and memory space, with the number of inputs, outputs, and observations, without sacrificing accuracy. In this thesis, we identify three main areas where decision tree methods could be improved, and we provide and evaluate original algorithmic solutions for each: (i) learning over high-dimensional output spaces, (ii) learning with large sample datasets under stringent memory constraints at prediction time, and (iii) learning over high-dimensional sparse input spaces.

A first approach to learning tasks with a high-dimensional output space, called binary relevance or single target, is to train one decision tree ensemble per output. However, it completely neglects the potential correlations between the outputs. An alternative approach, multi-output decision trees, fits a single decision tree ensemble targeting all outputs simultaneously, under the assumption that all outputs are correlated. Nevertheless, both approaches (i) have exactly the same computational complexity and (ii) target extreme output correlation structures. In our first contribution, we show how to combine random projection of the output space, a dimensionality reduction method, with the random forest algorithm to decrease the learning time complexity. Accuracy is preserved, and may even be improved by reaching a different bias-variance tradeoff. In our second contribution, we first formally adapt the gradient boosting ensemble method to multi-output supervised learning tasks such as multi-output regression and multi-label classification. We then propose to combine single random projections of the output space with gradient boosting on such tasks, so as to adapt automatically to the output correlation structure.
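
The following minimal sketch illustrates the idea behind the first contribution, assuming a Gaussian projection matrix and a pseudo-inverse decoding step (both illustrative choices rather than the thesis' exact construction): the output matrix is compressed before tree growing, and predictions are mapped back to the original output space.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
n_samples, n_features, n_outputs, n_components = 300, 20, 100, 10

X = rng.randn(n_samples, n_features)
Y = rng.randn(n_samples, n_outputs)            # stand-in for real targets

# Gaussian random projection of the output space (n_outputs -> n_components).
P = rng.randn(n_outputs, n_components) / np.sqrt(n_components)
Y_proj = Y @ P                                 # (n_samples, n_components)

# One multi-output forest is grown on the compressed targets, so the
# impurity computations scale with n_components instead of n_outputs.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, Y_proj)

# Decode predictions back to the original output space via the pseudo-inverse.
Y_hat = forest.predict(X) @ np.linalg.pinv(P)
print(Y_hat.shape)                             # (300, 100)
```

The same projection idea carries over to the boosting setting. Below is a simplified sketch of multi-output gradient boosting with single random projections under squared loss; the per-stage update rule shown here is an assumption for illustration and may differ from the exact scheme developed in the thesis.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gb_single_projection(X, Y, n_stages=100, lr=0.1, seed=0):
    """Multi-output gradient boosting, one random output direction per stage."""
    rng = np.random.RandomState(seed)
    F = np.tile(Y.mean(axis=0), (len(Y), 1))    # constant initial model
    stages = []
    for _ in range(n_stages):
        R = Y - F                               # squared-loss residuals
        w = rng.randn(Y.shape[1])
        w /= np.linalg.norm(w)                  # single random projection
        tree = DecisionTreeRegressor(max_depth=3).fit(X, R @ w)
        F += lr * np.outer(tree.predict(X), w)  # update all outputs at once
        stages.append((tree, w))
    return stages, F

rng = np.random.RandomState(1)
X, Y = rng.randn(200, 10), rng.randn(200, 40)
_, F = gb_single_projection(X, Y)
print(np.mean((Y - F) ** 2))                    # training loss after boosting
```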

The random forest algorithm often generates large ensembles of complex models, thanks to the availability of a large number of observations. However, the space complexity of such models, proportional to their total number of nodes, is often prohibitive, making them ill-suited to stringent memory constraints at prediction time. In our third contribution, we propose to compress these ensembles by solving an L1-based regularization problem over the set of indicator functions defined by all their nodes.
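
A minimal sketch of this compression idea, assuming scikit-learn's `decision_path` to build the node-indicator matrix and a plain Lasso as the L1 solver (the thesis' exact formulation and solver may differ); `alpha` controls how aggressively nodes are pruned:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=500, n_features=20, random_state=0)
forest = RandomForestRegressor(n_estimators=50, max_depth=5,
                               random_state=0).fit(X, y)

# decision_path returns a sparse {0,1} matrix with one column per node of
# the ensemble and a 1 wherever a sample traverses that node.
Z, _ = forest.decision_path(X)

# The L1 penalty drives most node weights to zero; only nodes with a
# non-zero weight need to be kept in the compressed predictor.
lasso = Lasso(alpha=0.1, max_iter=50000).fit(Z, y)
kept = np.flatnonzero(lasso.coef_)
print(f"kept {kept.size} of {Z.shape[1]} node indicator functions")
```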

Some supervised learning tasks have a high-dimensional but sparse input space, where each observation has non-zero values for only a few of its input variables. Standard decision tree implementations are not well adapted to sparse input spaces, unlike other supervised learning techniques such as support vector machines or linear models. In our fourth contribution, we show how to exploit input-space sparsity algorithmically within decision tree methods. Our implementation yields a significant speed-up on both synthetic and real datasets, while producing exactly the same model. It also reduces the memory required to grow such models, by using sparse instead of dense storage for the input matrix.
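
As a usage-level illustration rather than the thesis code itself, scikit-learn's tree ensembles accept scipy.sparse input directly, which is the kind of sparsity-aware tree growing the abstract describes:

```python
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
# 1% dense: each row has ~50 non-zero features out of 5000.
X = sparse_random(1000, 5000, density=0.01, format="csc", random_state=rng)
y = rng.randint(0, 2, size=1000)

# Fitting accepts CSC input directly; the grown model is exactly the one
# that would be obtained from the equivalent dense matrix X.toarray().
clf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)
print(clf.score(X.tocsr(), y))   # prediction-side methods take CSR input
```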

Bibliographic details

  • Author

    Joly, Arnaud

  • Year 2017
  • Original format PDF
  • Language en
